Data transfer method and system for loudspeakers in a digital sound reproduction system

ABSTRACT

The present publication describes a data transfer method and system in a digital sound reproduction system. The method comprises steps for generating a digital audio stream for multiple channels in a host data source, e.g. a computer, the audio stream being formed by multiple consecutive samples, and receiving the digital audio stream sent by the host data source through a digital data transmission network by several digital receivers, each of which includes a microcontroller with a clock, the receivers further including means for generating an audio signal. In accordance with the invention the host data source repeatedly sends a synchronization sample to at least one receiver, the receiver replies to the synchronization sample with a return sample, the host calculates a latency (T) for each receiver based on the sending time (Th₁) of the synchronization sample, the reception time (Th₂) of the return sample and the processing time (Tt₂-Tt₁) of the receiver, the host sends to the receiver information on the calculated latency (T) in combination with the time stamp of the measurement time, the receiver adjusts the function of its clock based on this information, and the above synchronization steps are repeated continuously.

FIELD OF INVENTION

The present invention relates to a data transfer method according to the preamble of claim 1.

The invention also relates to a data transfer system.

BACKGROUND OF INVENTION

According to the prior art, there are several commercial systems for digital audio reproduction in digital networks. For example, the following products are available today: the Gibson MaGIC™ network, CobraNet™, EtherSound™, Livewire™, MADI™ and others describe systems by which audio data may be streamed to digital loudspeakers or sound reproduction systems. Basically the quality of the reproduction in these systems is very good for home use, but for professional use the digital transfer technology causes some problems.

In accordance with the prior art the above problem has been solved by buffering the information into receivers and controlling the unloading of the information from the receivers.

In more detail, to synchronize clocks over Ethernet connections the exact travel time of network packets must be measured. This is difficult for two reasons. First, the standard network socket API will introduce random latency between calling the user-mode send function and the actual output of the packet, depending on the status of the operating system. The same applies to reception of packets: the time between reception of a packet from the network and its indication to the user-mode process listening to the UDP socket cannot be accurately determined.

Secondly, when a packet travels through the network it will go through one or more hubs, switches or routers. Each device may randomly delay packets depending on the load of the network and the state of the device. This introduces random latency in travel time that cannot be predicted. When measured, it is found that the latency is nearly constant for most of the packets, but some packets may be delayed by several hundreds of microseconds or even more.

SUMMARY OF INVENTION

The invention is intended to eliminate some defects of the state of the art disclosed above and for this purpose create an entirely new type of method and apparatus for data transfer in a sound reproduction system.

The invention is based on implementing network packet time stamping in the network protocol stack so that accurate times for sending and receipt of packets can be determined. In a preferred embodiment the receiver software implements the time stamping directly in the Ethernet driver (for which we have source code) for the most accurate operation possible.

The second problem is preferably solved simply by running the clock synchronization, which includes determination of round-trip time between host and receiver, and performing the synchronization only if the latency is within an acceptable range from the measured minimum latency.

More specifically, the method according to the invention is characterized by what is stated in the characterizing portion of claim 1.

The system according to the invention is, in turn, characterized by what is stated in the characterizing portion of claim 6.

Considerable advantages are gained with the aid of the invention.

The present invention is especially suitable for multi-channel sound reproduction systems, where a data stream including audio information of multiple audio channels, to be reproduced simultaneously in several loudspeakers, is sent along the same data transfer path.

With the aid of the method according to the invention, a statistical latency time may be defined in a start-up procedure and this value used as a reference latency time for further, continuous latency measurement.

By these two methods the audio reproduction system may adapt to the load of the network and make suitable adjustments in order to maintain high quality and synchronized multi-channel audio reproduction in most of the load variation cases.

In the following, the invention is examined with the aid of examples and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows as a block diagram a digital audio system, which can be used in connection with the present invention.

FIG. 2 shows as a block diagram one network management host system in accordance with the invention.

FIG. 3 shows as a block diagram one receiver management system according to the invention.

FIG. 4 shows as a timing diagram a method in accordance with the invention.

FIG. 5 shows as a timing diagram a method in accordance with the invention.

FIG. 6 shows as a flow chart a synchronization protocol in the receiver in accordance with the invention.

FIG. 7 shows as a flow chart a synchronization protocol in the host in accordance with the invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENT

In the invention, the following terminology is used in connection with the reference numbers. However, the list is not exhaustive especially relating to the block and flow diagrams of FIGS. 7-11:

-   1 host or host data source
-   2 receiver, digital loudspeaker
-   2a wireless receiver
-   3 switch, network
-   4 group of receivers
-   10 hard disc
-   12 virtual software audio adapter (driver)
-   13 audio data manager
-   14 synchronization manager
-   15 network interface
-   16 network timestamping
-   17 system clock
-   20 network interface
-   22 timer hardware
-   23 adjustable oscillator
-   24 loudspeaker networks communications
-   25 synchronization controller
-   26 digital to analog conversion
-   27 audio stream controller
-   28 data output controller
-   29 sample rate converter
-   60 synchronization signal/ECHO REQ
-   61 Return message/ECHO RESP
-   62 Control Command/SET CLOCK
-   150 Wireless Local Area Network (WLAN) access point

Also the following acronyms and abbreviations are used in the following text.

DHCP Dynamic Host Configuration Protocol

FEC Forward Error Correction

GLM Genelec Loudspeaker Manager

Global LSNW Multicast address:port to which all global LSNW traffic is sent. All receivers listen to this address to receive DISCOVERY, ANNOUNCE, GROUP and other global messages.

Group LSNW Multicast address:port to which all data directed to a set of grouped receivers is sent. All receivers that are assigned to the same group listen to the same group address. The group address will receive clock synchronization messages, streamed audio and GLM control messages.

Host Application that manages the loudspeaker network, streams audio and sends GLM control messages.

IP Internet Protocol

LSNW Loudspeaker Network

Multicast A special IP address that will be routed to members of a multicast address group.

Receiver Processor, network interface and the software that connects a loudspeaker to an IP network

UDP User datagram protocol

Further, in this application latency means the network delay between two network elements for a data sample.

In accordance with FIG. 1 the system in accordance with the invention comprises at least one host computer 1 or host data source for controlling the system and several receivers 2 connected to the host computers 1 via a digital network 3 comprising the signal path 3 formed by cables, connectors, network adapters, switches, etc.

In other words the LSNW (Loudspeaker network) system consists of one or more hosts 1 that each manage sets of receiver devices 2. Hosts 1 act as the source of management, control and audio data to the receivers 2. Hosts 1 are responsible for discovering receivers 2 connected to the IP network, managing groups 4 of receivers and providing them with audio. Receivers 2 respond to commands and play back audio data from hosts 1.

In accordance with FIG. 2 the host system typically comprises a hard disc 10 on which digital audio data may be stored. Some other non-volatile medium, such as flash memory, can also be used. Digital audio data may be acquired from a virtual software audio adapter (driver) 12 that redirects audio to networked loudspeakers. Audio data manager 13 acquires digital audio data and makes it suitable for streaming. Streaming and synchronization manager 14 controls clock synchronization of the loudspeaker devices (receivers) currently controlled by the host. Network interface 15 connects the host to the computer communications network. Network timestamp module 16 manages accurate timing of synchronization-related network traffic. This is required to reduce the effects of random latencies introduced by the non-real-time operating system (such as Windows, Linux etc.) run by the host. System clock 17 provides accurate time information used by the synchronization manager, and a standard Ethernet network 3 enables IP-based communications between the host and the receivers.

The host application manages the loudspeaker network and routes management information from GLM and audio from audio software to receivers. The host application will run as a background daemon process on the host computer. On the Windows platform, these background processes are usually referred to as services or system services.

The host provides an interface for GLM software to send and receive GLM messages to and from receivers as if the GLM software was using a GLM network.

The host software will provide a standard audio interface for audio software to send audio to LSNW receivers. Such interfaces are for example ASIO and Windows audio. The audio software will see LSNW receivers as channels in the virtual audio interface provided by the host.

Host will include proprietary kernel-mode driver software to provide necessary virtual audio interface and UDP Network interface 20 connects the receiver to communications network 3. Timer hardware 22 provides time information for the system clock and synchronization controller. Adjustable oscillator 23 provides the clock signal for the timer hardware and audio data output controller 28. Loudspeaker networks communications module 24 manages network traffic to and from the host computer. Synchronization controller 25 synchronizes the receiver's clock with the host. It adjusts the clock oscillator in order to minimize clock drift between receiver and host clocks. Digital signal processing and digital-to-analog conversion take place in block 26. Audio stream controller 27 manages audio data received from the host and feeds it to audio data output controller 28. Audio data output controller 28 outputs audio data at the rate specified by adjustable oscillator 23. This guarantees that samples will be output at the same rate as the host outputs them. Sample rate converter 29 converts digital audio to the internal sample rate used by digital signal processing and digital-to-analog conversion.

Accurate clock synchronization is essential for correct working of the LSNW. The LSNW protocol has a mechanism for clock synchronization that enables synchronization of host and receiver clocks within an accuracy of about 10-20 microseconds.

The solution to the travel time (latency) measurement is to implement network packet time stamping in the network protocol stack so that accurate times for sending and receipt of packets can be determined. In Windows host software the time stamping is implemented as an IP Packet Filter that examines incoming and outgoing UDP packets and records time stamps if a packet is destined to or originates from an LSNW receiver. This location is not optimal for time stamping, as the time stamps should be collected as near the network hardware as possible, but experience shows that time stamping at the IP Packet Filter level gives good accuracy.

The receiver software in accordance with the invention implements the time stamping directly in the Ethernet driver for the most accurate operation possible. For this purpose source code has been developed in connection with the invention.

The problem of random variation of network latency can be solved simply by running the clock synchronization, which includes determination of round-trip time between host and receiver, and performing the synchronization only if the latency is within an acceptable range from the measured minimum latency.

Clock synchronization is initiated by the host in accordance with FIG. 4. The host 1 will synchronize clocks with each group member in a round-robin fashion to guarantee that all receivers have accurate time. A receiver may send a SYNCH REQUEST message to the host if it needs to resynchronize its clock. This can happen for example if the receiver must interrupt the audio stream due to packet loss and continues it when audio packets are received again.

When a receiver 2 is assigned to a group, the host will send several ECHO REQ packets 60 to the receiver to probe the roundtrip latency. The receiver 2 will reply with ECHO RESP 61 and the host 1 will then determine the roundtrip latency T = (Th₂ − Th₁) − (Tt₂ − Tt₁) for each transaction. Once the roundtrip latency T is determined with adequate accuracy, the host 1 will set the maximum acceptable roundtrip latency for successful synchronization. The latency will also change as a function of packet size, so the latency is probed for packets of different sizes.

The actual roundtrip latency is measured as follows:

    1. Send ECHO REQ 60 to receiver 2 (add extra payload to increase packet size if necessary; the receiver will not process the extra payload, as it is used only to change the actual UDP datagram size to determine latency for different packet sizes).
    2. Get timestamp TSsend (Th₁) for the packet containing ECHO REQ 60 from the timestamp driver.
    3. Receive ECHO RESP 61 from the receiver. It will contain the receiver ProcessingLatency, which is the number of microseconds the receiver spent between receipt of the ECHO REQ 60 and sending of ECHO RESP 61.
    4. Get timestamp TSrecv (Th₂) for the ECHO RESP 61 packet. The timestamp is formed by the host 1.
    5. Roundtrip latency is TSrecv − TSsend − ProcessingLatency.
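
By way of a non-limiting illustration, the calculation in steps 2 to 5 above can be sketched as follows. The class name, the field names and the numeric values are hypothetical and are not taken from the host software; only the formula of step 5 comes from the description.

```python
from dataclasses import dataclass


@dataclass
class EchoTransaction:
    """Timestamps for one ECHO REQ / ECHO RESP exchange, in microseconds."""
    ts_send_us: int             # TSsend (Th1): host timestamp of the ECHO REQ packet
    ts_recv_us: int             # TSrecv (Th2): host timestamp of the ECHO RESP packet
    processing_latency_us: int  # ProcessingLatency reported by the receiver


def roundtrip_latency_us(t: EchoTransaction) -> int:
    """Step 5: TSrecv - TSsend - ProcessingLatency."""
    return t.ts_recv_us - t.ts_send_us - t.processing_latency_us


# Hypothetical values: 540 us elapsed at the host, 200 us spent in the receiver.
example = EchoTransaction(ts_send_us=10_000_000,
                          ts_recv_us=10_000_540,
                          processing_latency_us=200)
print(roundtrip_latency_us(example))  # 340
```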

Actual clock synchronization starts like the request-response transaction in the initialization phase. The host sends an ECHO REQ 60 and the receiver replies with ECHO RESP 61.

The ECHO RESP 61 packet contains two values: the receiver's clock at the time Tt₁ of receipt of the ECHO REQ 60 packet, and ProcessingLatency, the time spent by the receiver between receipt of ECHO REQ 60 and sending of ECHO RESP 61.

The host 1 will calculate the roundtrip latency as in the initialization phase, and if the latency is below the maximum acceptable value determined in initialization, the host sends the SET CLOCK message 62 to receiver 2. The message contains an estimate of the host's clock at the time the receiver received the ECHO REQ 60 packet. The estimated time is calculated by adding half of the measured roundtrip time to the time of outputting the ECHO REQ 60 packet.
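
As a minimal sketch of the decision described above, the host-side calculation may be written as below. The function name and parameters are hypothetical, all times are in microseconds, and treating a latency exactly equal to the maximum as acceptable is an assumption, since the description only states "below".

```python
from typing import Optional


def build_set_clock_value(ts_send_us: int, ts_recv_us: int,
                          processing_latency_us: int,
                          max_acceptable_us: int) -> Optional[float]:
    """Return the host clock estimate for SET CLOCK, or None to skip this round."""
    roundtrip_us = ts_recv_us - ts_send_us - processing_latency_us
    if roundtrip_us > max_acceptable_us:
        # Latency too high: no SET CLOCK message is sent for this transaction.
        return None
    # Estimate of the host clock at the moment the receiver got the ECHO REQ:
    # send time plus half of the measured roundtrip time.
    return ts_send_us + roundtrip_us / 2


print(build_set_clock_value(10_000_000, 10_000_540, 200, 400))  # 10000170.0
print(build_set_clock_value(10_000_000, 10_000_540, 200, 300))  # None
```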

The protocol assumes that the network latency from host to receiver is equal to the latency from receiver to host. This is usually the case, but the roundtrip will become asymmetric when ECHO REQ is appended to audio data, as the packet that contains ECHO REQ and audio data is much larger than the response packet that contains only ECHO RESP. This asymmetry can be compensated for by appending extra data to ECHO RESP to make the response packet the same size as the request. In real applications, the asymmetry of network packet sizes does not have a very large effect on the actual result of the synchronization. The effect of asymmetric network latency on the offset between host and receiver clocks can be calculated as follows, where L_ht is the latency from host to receiver and L_th the latency from receiver to host (for simplicity, the calculation does not include processing latency):

    1. The host clock at the end of synchronization will be Th₁ + L_ht + L_th + L_ht (= host time at start + latency of ECHO REQ + latency of ECHO REPLY + latency of SET CLOCK).
    2. The receiver clock at the end of synchronization will be Th₁ + (L_th + L_ht)/2 + L_th + L_ht (= SET CLOCK time + latency of ECHO REPLY + latency of SET CLOCK).
    3. The difference between the clocks (receiver minus host) will be (Th₁ + (L_th + L_ht)/2 + L_th + L_ht) − (Th₁ + L_ht + L_th + L_ht) = (L_th − L_ht)/2.

Further, in more detail, in accordance with FIG. 5 the host and receiver clocks have a 2 second offset at host time 10.000000 s (Th₁). Synch protocol packet latencies are 0.000160 s from host to receiver (Th₁-Tt₁) 60 and 0.000180 s from receiver to host 61 (Tt₂-Th₂). This will result in a 0.000010 s clock offset at the end of synchronization, assuming that the receiver's clock does not significantly drift from the host clock between target time 12.000160 and 12.000710. Receiver 2 may also correct the frequency of its clock based on the measured offsets and reduce the average error between target and host clocks.

Since the network latencies (0.000160 s and 0.000180 s) were not equal, host and target clocks will have an offset of (0.000180 − 0.000160)/2 = 0.000010 s at the end of synchronization (Tt₃).
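
The FIG. 5 arithmetic can be checked with a small simulation, given here only as a sketch. Processing latency is set to zero, as in the simplified calculation above, and it is assumed that the receiver applies the SET CLOCK value as a correction relative to its own clock reading Tt₁; the function name and integer-microsecond units are hypothetical.

```python
def residual_offset_us(th1_us: int, initial_offset_us: int,
                       l_ht_us: int, l_th_us: int) -> int:
    """Receiver-minus-host clock offset left after one synchronization round."""
    # Receiver clock reading when ECHO REQ arrives (receiver runs ahead by the offset).
    tt1_us = th1_us + l_ht_us + initial_offset_us
    # Host clock reading when ECHO RESP arrives (zero processing latency assumed).
    th2_us = th1_us + l_ht_us + l_th_us
    roundtrip_us = th2_us - th1_us
    # Host estimate of its own clock at the moment the receiver got ECHO REQ.
    host_estimate_us = th1_us + roundtrip_us // 2
    # Assumed receiver behaviour: shift the clock so that Tt1 maps to the estimate.
    correction_us = host_estimate_us - tt1_us
    return initial_offset_us + correction_us


# FIG. 5 values: 2 s initial offset, 160 us host-to-receiver, 180 us receiver-to-host.
print(residual_offset_us(10_000_000, 2_000_000, 160, 180))  # 10
```

With equal latencies in both directions the same function returns zero, which matches the symmetric-network assumption of the protocol.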

In accordance with FIG. 6, at start 90 the receiver initializes hardware, possibly acquires an IP address via DHCP and enters the Idle state. In block 91 the receiver receives the SET GROUP command from the host. The message contains the IP address of the multicast group to which all the loudspeaker group related traffic is sent. The message also contains information on which channel of multi-channel audio the receiver is to output to digital-to-analog conversion. The receiver starts to listen to the multicast address. It also sends a message to the host acknowledging that the receiver has entered the group. The receiver enters state 92, RUNNING. In running state 92 the receiver will receive messages directed to the loudspeaker group multicast IP address. Audio data is entered into the play queue and eventually output to digital-to-analog conversion. If the receiver receives a REQUEST TIMESTAMP message it enters state 97, SEND TIMESTAMP TO HOST.

If the receiver receives a SET CLOCK message it enters state 93. In block 93 the validity of the new clock value is determined based on the current time, the estimate of clock drift between host and receiver and the time interval since the last SET CLOCK message. If the new value appears invalid (due to large processing latency in the host or some other reason), the receiver clock is not set and control returns to state 92, RUNNING. If the new clock value appears valid, state 94, ADJUST OSCILLATOR, is entered. The control voltage to the adjustable oscillator is set in block 94 based on the measured drift between host and receiver clocks and the current control voltage. In block 95, if the measured clock offset between receiver and host is less than the duration of a specified number of samples, state 92, RUNNING, is entered. In block 96, if the measured clock offset between receiver and host is more than the duration of the specified number of samples, the clock value is adjusted by a multiple of sample durations. At the same time samples are added to or removed from the audio stream to compensate for the clock adjustment. After the adjustment, control returns to state 92, RUNNING. Further, in block 97 a TIMESTAMP message containing the current receiver clock value and processing latency is sent to the host.
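
The handling of SET CLOCK in blocks 93, 95 and 96 may be sketched as below. This is an illustration under stated assumptions, not the receiver firmware: the validity check of block 93 is reduced to a single hypothetical plausibility limit, the oscillator adjustment of block 94 is omitted, the sign of the sample count simply follows the sign of the offset, and all names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SetClockResult:
    accepted: bool           # False if the new clock value was judged invalid
    samples_to_adjust: int   # whole samples to add/remove (sign follows the offset)
    clock_step_us: float     # clock step, always a multiple of the sample duration


def handle_set_clock(offset_us: float, max_plausible_offset_us: float,
                     sample_rate_hz: int, threshold_samples: int) -> SetClockResult:
    # Block 93 (simplified): reject values that look implausible.
    if abs(offset_us) > max_plausible_offset_us:
        return SetClockResult(False, 0, 0.0)
    offset_in_samples = offset_us * sample_rate_hz / 1_000_000
    # Block 95: small offsets are left to the oscillator adjustment alone.
    if abs(offset_in_samples) < threshold_samples:
        return SetClockResult(True, 0, 0.0)
    # Block 96: step the clock by a whole number of sample durations and
    # add or remove the same number of samples in the audio stream.
    whole_samples = int(offset_in_samples)  # truncates toward zero
    step_us = whole_samples * 1_000_000 / sample_rate_hz
    return SetClockResult(True, whole_samples, step_us)


# Hypothetical case: 100 us offset at 48 kHz with a 2-sample threshold.
print(handle_set_clock(100.0, 1000.0, 48_000, 2).samples_to_adjust)  # 4
```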

In accordance with FIG. 7, in block 100 the host application is started. It queries the network for available receiver loudspeakers and enters the IDLE state. In block 101 the host application receives a command from the user interface to set up a receiver loudspeaker group. It starts analyzing network latency to each loudspeaker. In block 102, if the analysis was not successful, an error is reported to the user and the system returns to the IDLE state. Analysis may not succeed for example if the packet loss in the network is too large. If analysis of network latencies to each receiver is successful, the maximum acceptable synchronization network latency is stored for each receiver and state 103, RUNNING, is entered.

In running state 103 the system periodically synchronizes receiver loudspeaker clocks. In block 104 a timestamp request is sent to the receiver. If a reply is not received within a given period, the system returns to the running state and retries the synchronization. If the synchronization fails several times consecutively, the system marks the receiver loudspeaker as inactive and removes it from the group of active receivers. If TIMESTAMP is received from the receiver, the system enters state 105. In block 105 the system determines the network latency for the synchronization transaction. If it is above the maximum acceptable synchronization network latency determined in 101, the system enters state 108. If the latency is below the acceptable maximum, the system enters state 106. In state 106 the system sends the SET CLOCK message to the receiver. In block 107, if the time since the last latency analysis is below a given threshold value, the system enters state 103, RUNNING. If the time elapsed since the last analysis is too large, the system reanalyzes network latency to the receiver by entering state 101, in order to detect whether network latency has been permanently reduced. In block 108, if more than a given number of consecutive synchronization transactions have network latency larger than the acceptable maximum, the system performs latency analysis in order to determine permanent growth of network latency.
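
The decisions made in blocks 105 to 108 can be summarized in a short sketch. The state class, the action strings and the parameter names are hypothetical and chosen only for illustration; timeout handling of block 104 is left out.

```python
from dataclasses import dataclass


@dataclass
class HostSyncState:
    max_acceptable_us: int      # stored after the latency analysis (block 101)
    consecutive_high: int = 0   # transactions in a row above the maximum


def on_timestamp(state: HostSyncState, latency_us: int,
                 time_since_analysis_s: float, reanalysis_interval_s: float,
                 max_consecutive_high: int) -> str:
    """Decide what the host does after a TIMESTAMP reply (blocks 105-108)."""
    if latency_us > state.max_acceptable_us:
        # Block 108: count consecutive slow transactions.
        state.consecutive_high += 1
        if state.consecutive_high > max_consecutive_high:
            state.consecutive_high = 0
            return "REANALYZE"   # latency may have grown permanently
        return "SKIP"            # back to RUNNING without SET CLOCK
    state.consecutive_high = 0
    # Block 106 sends SET CLOCK; block 107 checks the age of the last analysis.
    if time_since_analysis_s > reanalysis_interval_s:
        return "SET_CLOCK_THEN_REANALYZE"
    return "SET_CLOCK"


state = HostSyncState(max_acceptable_us=400)
print(on_timestamp(state, 350, 10.0, 60.0, 3))  # SET_CLOCK
```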

According to one embodiment of the invention, the proposed synchronization method can in principle also be utilized in wireless audio applications, such as wireless loudspeaker systems.

Due to the lower transfer rate and delays introduced by the media access control of standard wireless networks, such as 802.11 a/b/g, the network latencies are considerably larger than in wired 100 Mbps or 1000 Mbps Ethernet networks. The synchronization protocol can adapt to this increased latency as it analyses the network's behavior during the setup phase.

Standard wireless networks also introduce random latency in order to prevent collisions during packet transmissions. These random delays make the synchronization in wireless networks more difficult than in Ethernet-based wired networks. The effects of said random delays can be reduced by selecting the acceptable maximum network latency using a stricter percentage value than when operating in wired networks. If a percentage of 30% is used instead of 90%, only transactions with less random delay will be used for clock synchronization. This modification means that each clock synchronization requires on average 3 ECHO REQUEST/ECHO REPLY transactions before acceptable values are acquired for the SET CLOCK command.
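
One possible way to derive the acceptable maximum from probed latencies is sketched below. The selection rule (the smallest measured value such that at least the given percentage of measurements are at or below it), the function names and the sample latencies are assumptions made for illustration. The estimate of transactions per synchronization further assumes that each transaction independently passes the limit with probability equal to the chosen percentage, which gives about 3.3 for 30%.

```python
from typing import List


def acceptable_maximum_us(latencies_us: List[int], percent: int) -> int:
    """Smallest measured latency with at least `percent` % of values at or below it."""
    if not latencies_us or not 0 < percent <= 100:
        raise ValueError("need measurements and a percentage in 1..100")
    ordered = sorted(latencies_us)
    # Ceiling of percent * n / 100 using integer arithmetic only.
    count = (percent * len(ordered) + 99) // 100
    return ordered[count - 1]


def expected_transactions(percent: int) -> float:
    """Mean number of tries until one passes, assuming independent transactions."""
    return 100 / percent


# Hypothetical probe results in microseconds.
probes = [100, 105, 110, 120, 130, 150, 180, 200, 400, 900]
print(acceptable_maximum_us(probes, 90))  # 400
print(acceptable_maximum_us(probes, 30))  # 110
```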

Wireless networks also typically have much larger packet loss than Ethernet-based wired networks due to radio interference and collisions during packet transmissions. To reduce the effects of packet loss, Forward Error Correction (FEC) encoding may be used to add redundancy to the transmitted audio data. This redundancy may be used by the receiver to reconstruct the audio packets lost by the network.
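
The description does not specify which FEC code is used. Purely as an illustration of the principle, the sketch below uses one of the simplest schemes, a single XOR parity packet per group of equal-length audio packets, which allows one lost packet per group to be rebuilt. The function names and the packet contents are hypothetical.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR equal-length packets together byte by byte."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        if len(packet) != len(parity):
            raise ValueError("all packets must have the same length")
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_single_loss(received: List[Optional[bytes]],
                        parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the surviving packets and the parity yields the lost packet.
        repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


group = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_parity(group)
print(recover_single_loss([group[0], None, group[2]], parity) == group)  # True
```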

CLAIMS

1. A data transfer method in a digital sound reproduction system, said method comprising the steps of: generating a digital audio stream for multiple channels in a host data source, the audio stream being formed by multiple consecutive samples, receiving the digital audio stream sent by the host data source through a digital data transmission network by several digital receivers each of which including a microcontroller with a clock, the receivers further including means for generating an audio signal out of the digital audio stream, initiating synchronization, by the host data source, of the receivers by sending repeatedly a synchronization sample to each of the receivers, wherein each receiver replies to the synchronization sample by a return sample, calculating, by the host data source, a latency (T) for each receiver based on the sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and a processing time (Tt₂-Tt₁) of the receiver, which processing time (Tt₂-Tt₁) is the time the receiver spent between receipt of the synchronization sample and sending of return sample, sending, by the host data source, to each receiver information on an estimate of the clock of the host data source at the time the receiver received the synchronization sample, adjusting the function of its clock for each receiver based on said information, and continuously repeating the above synchronization steps.

2. A method according to claim 1, wherein the digital audio stream is transmitted wirelessly to the receiver.

3. A method according to claim 1, wherein the receiver compensates for the clock difference by setting the local clock rate in order to obtain synch of the microcontroller of the receiver.

4. A method according to claim 1, wherein the host data source compares the calculated latency (T) with a reference latency and if the calculated latency (T) is larger than the reference latency, no adjustment information is sent to the receiver and the host data source starts a routine to redefine the reference latency.

5. A method according to claim 4, wherein the clock difference is compensated for by adding or removing samples to/from the audio data stream and adjusting clock value accordingly.

6. A data transfer system for a digital sound reproduction system, said data transfer system comprising: a host data source for generating a digital audio stream for multiple channels, the audio stream being formed by multiple consecutive samples, a transmission path for the host data source, multiple digital receivers capable to communicate over the transmission path with the host data source, the receivers including a means for receiving the digital audio stream sent by the host data source, a microcontroller with a clock, and a means for generating an audio signal out of the digital audio stream, wherein the host data source has means for initiating synchronization of the receivers by sending repeatedly a synchronization sample to each of the receivers, wherein each receiver has means for replying to the synchronization sample by a return sample, wherein the host data source further includes means for calculating a latency (T) for each receiver based on the sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and a processing time (Tt₂-Tt₁) of the receiver, which processing time (Tt₂-Tt₁) is the time the receiver spent between receipt of the synchronization sample and sending of return sample, sending to the each receiver information on an estimate of the clock of the host data source at the time the receiver received the synchronization sample, whereby based on this information each receiver includes means for adjusting the function of its clock, and the system includes means for repeating the above synchronization steps continuously.

7. A system according to claim 6, wherein it includes means for transmitting the digital audio stream wirelessly to the receiver.

8. A system according to claim 6, wherein the receiver includes means for compensating for the clock difference by setting the clock frequency of the microcontroller of the receiver.

9. A system according to claim 6, wherein the host data source includes means for comparing the calculated latency (T) with a reference latency and if the calculated latency (T) is larger than the reference latency, no adjustment information is sent to the receiver and the host data source starts a routine to redefine the reference latency.

10. A system according to claim 6, wherein the system includes means for compensating for the clock difference by adding or removing samples to/from the audio data stream.

11. A synchronization method in a digital sound reproduction system comprising the steps of: generating a digital audio stream in a host data source, the audio stream is formed by multiple consecutive samples, receiving the digital audio stream sent by the host data source through a digital data transmission network by several digital receivers each of which including a microcontroller with a clock, the receivers further including means for generating an audio signal out of the digital audio stream, whereby the receivers are grouped in a predetermined manner, initiating, by the host data source, synchronization of the receivers by sending repeatedly a synchronization sample to all receivers of a group, replying, by the receivers, to the synchronization samples by return samples, calculating, by the host data source, a latency time (T) for each sample and each receiver based on sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and a processing time (Tt₂-Tt₁) of the receiver, which processing time (Tt₂-Tt₁) is the time the receiver spent between receipt of the synchronization sample and sending of return sample, and statistically forming a reference latency value, by the host data source, based on the calculated latency times (T).

12. A method according to claim 11, wherein the digital audio stream is transmitted wirelessly to the receiver.

13. A method according to claim 11, wherein the reference latency is set such that at least 80% of the measured and calculated latency values are below the reference latency.

14. A method according to claim 11, wherein the reference latency is set such that at least 50% of the measured and calculated latency values are below the reference latency.

15. A method according to claim 11, further comprising the steps of: I Sending ECHO REQ to receiver, II Getting timestamp TSsend for the packet containing ECHO REQ from timestamp driver, III Receiving ECHO RESP, whereby it will contain receiver ProcessingLatency, which is amount of microseconds receiver spent between receipt of the ECHO REQ and sending of ECHO RESP, IV Getting timestamp TSrecv for the ECHO RESP packet, V Calculating roundtrip latency as TSrecv − TSsend − ProcessingLatency.